The sigmoidal tuning curve that maximizes the mutual information for a Poisson neuron, or
population of Poisson neurons, is obtained. The optimal tuning curve is found to have a discrete
structure that results in a quantization of the input signal. The number of quantization levels
undergoes a hierarchy of phase transitions as the length of the coding window is varied. We postulate,
using the mammalian auditory system as an example, that the presence of a subpopulation structure
within a neural population is consistent with an optimal neural code.

PACS numbers: 87.19.ls, 87.19.ll, 87.19.lt, 87.19.lo

Neuronal responses often appear noisy in the sense that repeated presentations of identical stimuli
result in variable action potential timings. This variability is often closely modeled by Poisson
statistics [?] and, hence, the Poisson neuron has become an archetypal model for neural rate coding.
In this model the input signal $x$ is coded in the mean firing rate $\nu = g(x)$, where $g(x)$ is known
as the tuning curve (or the stimulus-response curve, gain function or rate-level function). While
several definitions of rate exist [?], following related studies [?] we assume here that the
observable output when the mean rate is $\nu$ is the number of spikes, $k$, that occur in a time window
$T$. The input $x$ is assumed to be a continuous variable, such as an external sensory stimulus.

Despite the popularity of the Poisson neural model, remarkably the $g(x)$ that maximizes Shannon
mutual information [?] has not been obtained, except in the limit $T \to \infty$ [?]. Arguably, this
limit is not relevant to a large number of biological sensory systems, where it is well established
that behavioral responses occur on timescales that imply short coding windows [2]. In this letter we
obtain the optimal tuning curve for finite $T$.

Our main finding is that the optimal tuning curve is discrete, in the sense that many stimulus values
result in the same mean firing rate. The number of discrete levels, $M$, increases as $T$ increases.
This result means that when mutual information is to be maximized, signal quantization is an emergent
feature of the optimal coding scheme and is superior to analog coding. We also demonstrate that this
means neural subpopulations might be necessary to optimize an overall population.

This result of optimal $M$-ary coding differs significantly from [5, 6], which predicts a single phase
transition from binary to continuous tuning curves, i.e. from discrete to analog coding. This
difference arises because we maximize mutual information, while [5, 6] minimizes mean square error
(MSE). We consider mutual information, rather than other metrics like MSE [?] and Fisher information
[10], because it does not rely on assumptions about how a neuron's response may be 'decoded' [?].
Furthermore, mutual information is intimately linked to MSE via rate-distortion theory [11].



To derive the optimal tuning curve, we make use of a known result from the photonics literature. The
properties of Poisson neurons are very similar to those of direct detection photon channels, which
operate by modulating the intensity of a photon emitting source. Both can be modeled as Poisson point
processes [?].

A classical problem in communication theory is that of finding the signal distribution that maximizes
the mutual information for a channel. The resultant optimal code is said to achieve channel capacity.
The optimal input distribution for the direct detection photon channel has been proven to be discrete
[13, 14]. Indeed, the discreteness of optimal input distributions is the norm, regardless of whether
the output distribution is discrete or continuous [15, 16]. Consequently, although we have assumed a
rate code based on the discrete random variable defined by counting spikes (alternatively we could
have defined it in terms of a continuous random variable based on interspike intervals), the central
result in this paper is not dependent on the definition of rate but is rather a property of the
'channel noise' (see [?]).
The discreteness of the optimal signal for the optical Poisson channel implies that the optimal stimulus for a Poisson 
neural system is also discrete. However, this is not physically realistic, as the distribution of an external stimulus is 
not controlled by a neural system, and is likely to be continuous, e.g. speech, or natural sound statistics. Instead, it 
is plausible that a neural system may have been optimized by evolution so that the tuning curve discretizes its input 
to match the theoretical optimal source distribution. 

The mutual information [?] between the (continuous) input random variable $x \in X = [x_{\min}, x_{\max}]$
and the (discrete) output random variable, $k$, is

$$I(x;k) = \sum_{k=0}^{\infty} \int_X dx\, P_x(x)\, P[k|x] \log_2 \frac{P[k|x]}{P_k(k)}, \tag{1}$$

where $P_k(k) = \int_X dx\, P_x(x)\, P[k|x]$. Here $P_x(x)$, $P_k(k)$ and $P[k|x]$ are the distributions
of the stimulus, the response and the conditional distribution respectively.

For Poisson statistics, the conditional distribution is

$$Q[k|\nu] = \frac{(T\nu)^k}{k!} \exp(-T\nu), \quad k = 0, \ldots, \infty. \tag{2}$$

The mean firing rate is restricted to $\nu \in [\nu_{\min}, \nu_{\max}]$, where the upper bound
$\nu_{\max}$ is due to physiological limits (metabolic and refractory), and we set $\nu_{\min} = 0$.
Later, we use the notation $N = T\nu_{\max}$ to denote the maximum mean spike count.
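
As a minimal sketch of Eq. (2), the Poisson count distribution can be evaluated numerically as below. The function name `poisson_pmf` and the values chosen for the window and the maximum rate are illustrative assumptions, not taken from the paper.

```python
import math


def poisson_pmf(k, mean_count):
    """Q[k | nu] of Eq. (2), with mean_count = T * nu (hypothetical helper)."""
    if mean_count == 0.0:
        # A silent neuron emits zero spikes with certainty.
        return 1.0 if k == 0 else 0.0
    # Evaluate in log space so large k or large mean counts do not overflow.
    log_p = k * math.log(mean_count) - mean_count - math.lgamma(k + 1)
    return math.exp(log_p)


T = 0.01         # coding window in seconds (illustrative value)
nu_max = 300.0   # maximum firing rate in spikes/s (illustrative value)
N = T * nu_max   # maximum mean spike count

probs = [poisson_pmf(k, N) for k in range(60)]
total = sum(probs)                               # should be very close to 1
mean = sum(k * p for k, p in enumerate(probs))   # should be very close to N
```

Truncating the sum at 60 counts is safe here only because the mean count is small; a larger $N$ would need a longer range.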

The conversion of a signal follows the Markov chain $x \to \nu \to k$. Note that $k$ is observable from
a short duration, $T$, while $\nu$ is not. We refer to $x \to \nu$ and $\nu \to k$ as separate
'subchannels.' To find the optimal channel, we maximize the mutual information by variation of the
distribution $P_\nu(\nu)$ for given $P_x(x)$ and $Q[k|\nu]$. Since the distribution of $\nu$ is
$P_\nu(\nu) = \int_X dx\, P_x(x)\, \delta(\nu - g(x))$, where $\delta(\cdot)$ is the Dirac delta function,
variation of $P_\nu(\nu)$ means variation of the tuning curve $g(x)$.

We now present the following theorem: The mutual information in the neural channel, $x \to \nu \to k$,
is maximized when the distribution $P_\nu(\nu)$ is discrete.

Remark. The neural channel forms a Markov chain for which the following equations are valid,

$$P(x,\nu,k) = P_{k|\nu}[k|\nu]\, P_{\nu|x}[\nu|x]\, P_x(x), \tag{3}$$
$$P(x,\nu,k) = P_{x|\nu}[x|\nu]\, P_{\nu|k}[\nu|k]\, P_k(k). \tag{4}$$

We assume $P_{k|\nu,x}[k|\nu,x] = P_{k|\nu}[k|\nu]$ and $P_{x|\nu,k}[x|\nu,k] = P_{x|\nu}[x|\nu]$ due to
the definitions of the subchannels as $\nu = g(x)$, i.e. $P_{\nu|x}[\nu|x] = \delta(\nu - g(x))$, where
$g(x)$ is a single-branched function, and $P_{k|\nu}[k|\nu] = Q[k|\nu]$.

Proof. First we prove that $I(x;k) = I(\nu;k)$. From Theorem 5.2.8 of [7], the mutual information
between the variable $k$ and the pair $(\nu, x)$ can be written in two ways,

$$I(k;(\nu,x)) = I(k;\nu) + I(k;x|\nu), \tag{5}$$
$$I(k;(\nu,x)) = I(k;x) + I(k;\nu|x), \tag{6}$$

where the conditional mutual information expressions are

$$I(k;x|\nu) = \sum_k \iint dx\, d\nu\, P(x,\nu,k) \log_2 \frac{P_{x|\nu,k}[x|\nu,k]}{P_{x|\nu}[x|\nu]},$$
$$I(k;\nu|x) = \sum_k \iint dx\, d\nu\, P(x,\nu,k) \log_2 \frac{P_{\nu|x,k}[\nu|x,k]}{P_{\nu|x}[\nu|x]}.$$

Since $P_{x|\nu,k}[x|\nu,k] = P_{x|\nu}[x|\nu]$, the conditional mutual information $I(k;x|\nu)$ is zero.
Next, note that the variable $\nu$ depends directly on the random variable $x$ through $\nu = g(x)$, and
hence $P_{\nu|x,k}[\nu|x,k] = P_{\nu|x}[\nu|x]$ and the conditional mutual information $I(k;\nu|x)$ is
also zero. Consequently, we are left with the two equations $I(k;(\nu,x)) = I(k;\nu)$ and
$I(k;(\nu,x)) = I(k;x)$. This means that $I(k;x) = I(k;\nu)$, and the mutual information in the neural
channel is equal to the mutual information of the noisy subchannel.
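
The equality of the two mutual informations can be checked numerically. The sketch below discretizes a uniform stimulus on a fine grid, passes it through a hypothetical three-level tuning curve (the thresholds and levels are arbitrary illustrative values, in units of mean spike count), and compares the information carried about $x$ with that carried about $\nu$. All function names are assumptions made for this illustration.

```python
import math


def poisson_pmf(k, m):
    """Poisson probability of k spikes given mean count m."""
    if m == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(m) - m - math.lgamma(k + 1))


def mutual_information(weights, mean_counts, k_max=80):
    """I(input; k) in bits for a discrete input with the given probabilities."""
    info = 0.0
    for k in range(k_max + 1):
        cond = [poisson_pmf(k, m) for m in mean_counts]
        p_k = sum(w * q for w, q in zip(weights, cond))
        for w, q in zip(weights, cond):
            if w > 0.0 and q > 0.0:
                info += w * q * math.log2(q / p_k)
    return info


def g(x):
    """Hypothetical three-level tuning curve (mean spike counts)."""
    if x < 0.4:
        return 0.0
    if x < 0.7:
        return 2.5
    return 7.0


n = 1000
xs = [(j + 0.5) / n for j in range(n)]   # uniform stimulus grid on [0, 1]

# Information about x: every grid point is a separate input symbol.
info_x = mutual_information([1.0 / n] * n, [g(x) for x in xs])

# Information about nu: stimuli sharing a firing rate are merged.
levels = sorted(set(g(x) for x in xs))
p_nu = [sum(1 for x in xs if g(x) == v) / n for v in levels]
info_nu = mutual_information(p_nu, levels)
```

Up to rounding error, `info_x` and `info_nu` agree: stimuli that map to the same rate are indistinguishable at the output, so nothing is lost by merging them.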

To proceed, we now consider the noisy neural subchannel $\nu \to k$, and use a theorem from [3] for
'direct detection' photon channels, where the input is a continuous-time inhomogeneous Poisson rate,
$\lambda(t)$. Due to 'bandwidth constraints,' $\lambda(t) \le A$ is constant during equal durations
$\Delta$. The output is the sequence of photon arrival times within $\Delta$, $\{t_i\}_{i=1}^{y}$. A key
result in [?, Section 3] is that this channel is mathematically equivalent to one where the output is
the photon count $y$ within $\Delta$, and the input is a time-independent variable $\lambda$. The latter
channel's distribution is given by Eq. (2), except that instead of counting $y$ photons in response to
$\lambda \le A$ during $\Delta$, the neural Poisson channel output is the spike count, $k$, in response
to $\nu \le \nu_{\max}$ during $T$. Thus, $I(\lambda(t); \{t_i\}_{i=1}^{y}) = I(\lambda; y) = I(\nu; k) = I(k; x)$,
and since [14, Theorem 1] states that the optimal distribution of $\lambda$ is discrete, the mutual
information in the neural channel is also maximized when $P_\nu(\nu)$ is discrete. □

The proven theorem does not provide any means for finding a closed-form solution for the optimal
discrete distribution, $P_\nu(\nu)$. However, its utility is that it allows a reduction in the set of
functions we need to consider when optimizing $P_\nu(\nu)$ and/or the tuning curve $g(x)$.

Without loss of generality we can now introduce the following simplifying restriction for the function
$g(x)$. Let $g(x)$ be a non-decreasing multi-step function

$$g(x) = \sum_{i=0}^{M-1} \gamma_i\, \sigma(x - \theta_i), \tag{7}$$

where $M$ is the number of levels and $\sigma(\cdot)$ is the Heaviside step function. Letting
$\beta_i = \sum_{n=0}^{i} \gamma_n$, we have $\theta_{i+1}$ as the value of $x$ at which $g(x)$ jumps from
value $\beta_i$ to $\beta_{i+1}$. Since we assume
$x_{\min} = \theta_0 < \theta_1 < \theta_2 < \cdots < \theta_{M-1} < \theta_M = x_{\max}$, the optimal
$g(x)$ is unique. This latter requirement means that we consider only the case of monotonically
non-decreasing (sigmoidal) tuning curves. Without this restriction it is not possible to find a unique
solution and hence this study does not generalize to non-monotonic tuning curves. This is not highly
restrictive, since sigmoidal tuning curves are widely observed in many sensory modalities [?].

The mutual information of the neural channel can be written as

$$I(x;k) = \sum_{k=0}^{\infty} \sum_{i=0}^{M-1} \alpha_i\, Q[k|\beta_i] \log_2 \frac{Q[k|\beta_i]}{\sum_{n=0}^{M-1} \alpha_n Q[k|\beta_n]}, \tag{8}$$

where $\alpha_i = \int_{\theta_i}^{\theta_{i+1}} dx\, P_x(x)$. The optimal function $g(x)$ cannot be easily
found in an analytical form using variational principles, because it leads to a set of transcendental
equations. Therefore we use stochastic gradient descent methods to solve for the optimal $P_\nu(\nu)$.
For alternative methods see e.g. [20].
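
To make Eq. (8) concrete, the sketch below evaluates it for an arbitrary set of level probabilities and levels, and then optimizes the simplest case, a binary code with levels 0 and $N$. The paper uses stochastic gradient descent; this illustration swaps in a plain one-dimensional grid search because the binary case has only one free parameter. The function names, the truncation rule for the sum over $k$, and the choice $N = 2$ are assumptions.

```python
import math


def poisson_pmf(k, m):
    """Poisson probability of k spikes given mean count m."""
    if m == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(m) - m - math.lgamma(k + 1))


def mutual_information(alphas, betas, k_max=None):
    """Eq. (8) in bits; betas are the levels expressed as mean spike counts."""
    if k_max is None:
        # Heuristic truncation: mean plus many standard deviations.
        top = max(betas)
        k_max = int(top + 12.0 * math.sqrt(top + 1.0)) + 20
    info = 0.0
    for k in range(k_max + 1):
        cond = [poisson_pmf(k, b) for b in betas]
        p_k = sum(a * q for a, q in zip(alphas, cond))
        for a, q in zip(alphas, cond):
            if a > 0.0 and q > 0.0:
                info += a * q * math.log2(q / p_k)
    return info


def best_binary_code(N, steps=1000):
    """Grid search over the probability alpha_0 of the silent level."""
    best_alpha0, best_info = None, -1.0
    for i in range(1, steps):
        alpha0 = i / steps
        info = mutual_information([alpha0, 1.0 - alpha0], [0.0, N])
        if info > best_info:
            best_alpha0, best_info = alpha0, info
    return best_alpha0, best_info


alpha0, info = best_binary_code(N=2.0)
```

For a uniform stimulus on [0, 1], the `alpha0` returned here is also the threshold $\theta_1$ of the binary tuning curve. Codes with more levels add further parameters, which is where a gradient-based search such as the one used in the paper becomes more practical than a grid.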

Fig. 1 shows the main results of our study. The upper insets display the normalized optimal tuning
curve, $f(x) = g(x)/N$, for four different values of maximum mean spike count, $N$. Fig. 2 shows the
overall population normalized firing rates, $\phi_i = \beta_i/N$, as well as the mutual information
corresponding to the optimal solution. Note that $I(x;k)$ in Eq. (8) is parameterized entirely by the
set $\alpha_i$, $\beta_i$, $i = 0, \ldots, M-1$, and it is these parameters that are optimized. The set of
$\theta_i$ required for the optimal $g(x)$ can be obtained for any given $P_x(x)$ from the $\alpha_i$.
Hence, in Fig. 1 without loss of generality we have assumed that the stimulus is uniformly distributed
on $[0, 1]$. Similarly, the $\gamma_i$ follow from the $\beta_i$.
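
The step from the level probabilities to the thresholds can be sketched as follows: each threshold is the stimulus value at which the cumulative stimulus distribution reaches the running sum of the level probabilities. The helper name `thresholds`, the example probabilities, and the second stimulus density (2x on [0, 1]) are hypothetical choices for illustration.

```python
import math
from itertools import accumulate


def thresholds(alphas, inverse_cdf):
    """Interior thresholds theta_1..theta_{M-1} from level probabilities.

    inverse_cdf maps a cumulative probability to a stimulus value.
    """
    cumulative = list(accumulate(alphas))
    # The last cumulative value is 1 and corresponds to x_max, so it is dropped.
    return [inverse_cdf(c) for c in cumulative[:-1]]


alphas = [0.5, 0.3, 0.2]   # hypothetical level probabilities for M = 3

# Uniform stimulus on [0, 1]: the inverse CDF is the identity.
uniform_thetas = thresholds(alphas, lambda u: u)

# Stimulus with density 2x on [0, 1]: the CDF is x**2, so the inverse is sqrt.
skewed_thetas = thresholds(alphas, math.sqrt)
```

The same optimal probabilities therefore give different thresholds for different stimulus statistics, which is why a uniform stimulus can be assumed without loss of generality.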

For small $N < 3$, only two firing rates are observed; for values of $x < \theta_1$, $f(x) = 0$ (the
absence of firing) while for larger values of $x$, $f(x) = 1$ (firing at the maximum allowable spike
rate). This form of optimal binary coding has been predicted previously for Poisson neurons using
estimation theory [5, 6]. It also agrees with the well known result that a binary source maximizes
information through a Poisson channel when the input can switch instantaneously between states
[13, 18].



As $N$ is increased, the number of steps in the optimal tuning curve increases; e.g. for $N = 7$, two
steps are observed giving rise to a ternary coding scheme, and for $N = 15$ three steps are observed
giving a 4-ary (quaternary) coding. In general, an $M$-ary code will be optimal with increasing $N$. As
$N \to \infty$ we predict that the optimal tuning curve will converge to a continuous function [?].
Fig. 1 shows how the partition boundaries, $\theta_i$, vary as $N$ is increased; new boundaries can be
seen to emerge via phase transitions. These appear to be continuous and hence are akin to second
order phase transitions of the optimal tuning curve.
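
The benefit of an extra level can be illustrated with a coarse search at $N = 7$, comparing the best binary code (levels 0 and $N$) with the best ternary code (levels 0, $\beta_1$ and $N$) found on the same grid. This is a sketch only: a grid search replaces the stochastic gradient descent used in the paper, and the grid resolution, the truncation of the sum over $k$, and all names are assumptions, so the values found are approximations rather than the optima plotted in the figures.

```python
import math


def poisson_pmf(k, m):
    """Poisson probability of k spikes given mean count m."""
    if m == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(m) - m - math.lgamma(k + 1))


def mutual_information(alphas, betas, k_max=40):
    """Eq. (8) in bits; the sum over k is truncated at k_max."""
    info = 0.0
    for k in range(k_max + 1):
        cond = [poisson_pmf(k, b) for b in betas]
        p_k = sum(a * q for a, q in zip(alphas, cond))
        for a, q in zip(alphas, cond):
            if a > 0.0 and q > 0.0:
                info += a * q * math.log2(q / p_k)
    return info


N = 7.0
S = 20                                       # probabilities in steps of 1/S
mids = [0.5 * j for j in range(1, 14)]       # candidate middle levels in (0, N)

# Best binary code on the grid.
binary = max(
    mutual_information([i / S, (S - i) / S], [0.0, N]) for i in range(S + 1)
)

# Best ternary code on the grid; j = 0 reproduces the binary candidates.
ternary = 0.0
for i in range(S + 1):
    for j in range(S + 1 - i):
        alphas = [i / S, j / S, (S - i - j) / S]
        for b1 in mids:
            ternary = max(ternary, mutual_information(alphas, [0.0, b1, N]))
```

Because a binary code can never carry more than one bit, any ternary candidate exceeding one bit is already enough to show that two levels are suboptimal at this value of $N$.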

Our findings of an optimal $M$-ary code are in agreement with isomorphic results on the information
maximizing source distribution for Poisson direct detection photon channels with imposed bandwidth
constraints [?]. In our context, a bandwidth constraint is equivalent to allowing $N > 1$. We further
note that the bifurcation structure in Fig. 1 is qualitatively similar to information optimization
results in [11], [21] for systems that are quite different to Poisson neurons.

One way of interpreting our results is that the steps in the optimal $f(x)$ partition the stimulus into
regions associated with neural subpopulations. For example, suppose an overall population consists of
$K$ neurons and $M - 1$ subpopulations, within which each neuron is identical, and binary with rates 0
and $\gamma_i$. Since the neurons are Poisson, the sum of the $K$ individual normalized firing rates is
equal to $f(x)$. For overall binary coding, the only way of achieving $f(x)$ would be a single
subpopulation, where each neuron is identical and able to fire at two rates, $\phi_0/K = 0$ and
$\phi_1/K = 1/K$, where rate $\phi_1/K$ is activated when $x > \theta_1$. For the ternary case, there would
be two subpopulations, of sizes $J$ and $K - J$, with individual normalized firing rates $\phi_1/J$ and
$(1 - \phi_1)/(K - J)$, so that the overall population has 3 rates: 0, $\phi_1$ and 1, as shown in Fig. 2.
The first subpopulation would only be activated when $x > \theta_1$ and the second when $x > \theta_2$.

We can estimate the sizes of the subpopulations in our example as follows. Since the sizes of the
subpopulations are proportional to the integrated firing rates, the neurons for ternary coding are
distributed with probabilities $P_1 = \phi_1 = \gamma_1/N$ and $P_2 = 1 - \phi_1$ respectively. The
quaternary coding scheme for $N = 15$ has three subpopulations with optimal individual firing rates
proportional to $\gamma_1$, $\gamma_2$ and $1 - \gamma_1 - \gamma_2$, and overall rates 0, $\phi_1$,
$\phi_2$ and 1. The sizes of the subpopulations are therefore $P_1 \propto \gamma_1 = N\phi_1$,
$P_2 \propto \gamma_2 = N(\phi_2 - \phi_1)$ and $P_3 = 1 - P_1 - P_2$, as shown in the lower insets in
Fig. 1.
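
The subpopulation fractions described above are simply the increments between successive normalized population rates. A minimal sketch follows; the function name and the numerical rates are hypothetical and are not values read from the figures.

```python
def subpopulation_fractions(phis):
    """Fractions P_1..P_{M-1} from normalized population rates.

    phis must be increasing, start at 0 and end at 1, e.g. [0, phi_1, 1]
    for a ternary code.
    """
    return [hi - lo for lo, hi in zip(phis, phis[1:])]


# Ternary code: P_1 = phi_1 and P_2 = 1 - phi_1.
ternary = subpopulation_fractions([0.0, 0.3, 1.0])

# Quaternary code: P_1 = phi_1, P_2 = phi_2 - phi_1, P_3 = 1 - P_1 - P_2.
quaternary = subpopulation_fractions([0.0, 0.2, 0.55, 1.0])
```

The fractions always sum to one because the rates run from 0 to 1.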

Our results lead to two main predictions for information-optimal neural coding: (i) that the tuning curves are 
discrete; and (ii) that neural populations should form subpopulations. 

The prediction of subpopulations seems to have some correspondence with known results and we take, as
an example, coding of sound level (at a fixed frequency) by the auditory system. Inner hair cells (the
sensory receptors that transduce sounds into neural activity) are each connected to approximately 15
separate afferent nerve fibers [?]. Physiological studies suggest that these fibers can be grouped into
two or three subpopulations based on their threshold to sound level [?].

The presence of three subpopulations would suggest a quaternary code ($M = 4$) is used. From Fig. 2 this
is optimal when $N \approx 15$. If each afferent fires at a rate of approximately 100 spikes/s then the
time required for the population to generate 15 action potentials is approximately 10 ms. This
timescale agrees with the classical temporal-window model proposed to explain the human auditory
system's temporal resolution, in which temporal integration is performed by a sliding window with an
equivalent rectangular duration of approximately 7 to 13 ms [25]. Hence, quaternary coding is
consistent with the known parameters of auditory coding and perception.
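
The timescale estimate above is a one-line calculation: the window is the target spike count divided by the total population rate. The sketch below uses the approximate figures quoted in the text (15 fibers, 100 spikes/s per fiber, 15 spikes); the variable names are illustrative.

```python
fibers = 15             # afferent fibers per inner hair cell (approximate)
rate_per_fiber = 100.0  # spikes/s per afferent (approximate)
target_spikes = 15.0    # maximum mean spike count N for the quaternary code

population_rate = fibers * rate_per_fiber         # spikes/s for the population
window_ms = 1000.0 * target_spikes / population_rate

# Compare with the 7 to 13 ms temporal window quoted in the text.
within_window = 7.0 <= window_ms <= 13.0
```

With these inputs the window comes out at 10 ms, inside the quoted range.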

We now turn our attention to the prediction that optimal-information tuning curves should be discrete.
This certainly seems to be inconsistent with physiologically measured tuning curves. However, evidence
for binary tuning curves does exist; for a discussion see [5]. There may well be other reasons why they
are not commonly observed in practice. For example, the steepness of the slope of the sigmoid is known
to depend strongly on the nature of the signal [26] and measurement window [27]. Consequently,
measuring the tuning curve with the 'right' stimulus (feature), $x$, and window duration may be crucial
to observing a rapidly increasing curve that approximates a step increase. At present it is not clear
if such experiments have been performed.

Of course, it is also possible that neural sensory systems are not optimized for the transmission of
information, although we note that minimization of MSE also leads to the prediction of binary tuning
curves for short decoding windows [?]. Alternatively, the model we consider may need revision to make
it physiologically more realistic. For example, it may be necessary to include other sources of noise
and to take into account non-Poisson statistics. Realistic signal statistics will also need to be
considered. Other constraints, such as metabolic penalty, may also be important, although this is not
likely to change the conclusion that information-optimal tuning curves are discrete. We hope our
results will inform future discussion on this topic and motivate further studies.

FIG. 1: The partition boundaries $\theta_i$ against the maximum mean spike count, $N = T\nu_{\max}$. Also
shown for $N = 2$, 7, 15 and 22 are the optimal $f(x)$ (top insets) and the population distributions
(bottom insets). The parameters are $x_{\min} = 0$, $x_{\max} = 1$, $\nu_{\min} = 0$, and $x$ uniformly
distributed.




FIG. 2: The set of optimal firing rates, $\phi_i$, $i = 0, \ldots, M-1$, for the overall population with
$M - 1$ subpopulations, as a function of $N$ (top), and the mutual information for the optimal solution
(bottom). The dashed line shows the mutual information that would result if binary coding were
utilized.



